Search results for "Decision tree learning"
showing 10 items of 13 documents
Comparing Boosting and Bagging for Decision Trees of Rankings
2021
Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques excel in being quite intuitive and interpretable, they also suffer from instability: small perturbations in the training data may result in big changes in the predictions. The so-called ensemble methods combine the output of multiple trees, which makes the decision more reliable and stable. They have been primarily applied to numeric prediction problems and to classification tasks. In recent years, some attempts to extend the ensemble methods to ordinal data can be found in the literature, but no concrete methodology has been provided for preference…
A methodology for fire data analysis based on pattern recognition towards the disaster management
2015
The aim of this paper is to investigate a proposed strategy for fire disaster analysis, implemented using a pattern recognition technique, in order to arrive at a methodology for disaster management. Since fire hazards have severe effects on people and property, it is essential to predict and, where possible, prevent them. Almost every fire produces effects such as heat, smoke, gas, and flame, which can be sensed and measured by devices or detection systems. Fire behavior is related to these effects. In this research, temperature, heat radiation, and visibility (smoke) data obtained from the Fire Dynamics Simulator (FDS) are used for analysis. The location of t…
Improving the Competency of Classifiers through Data Generation
2001
This paper describes a hybrid approach in which sub-symbolic neural networks and symbolic machine learning algorithms are grouped into an ensemble of classifiers. Initially, each classifier determines which portion of the data it is most competent in. The competency information is used to generate new data that are used for further training and prediction. The application of this approach in a difficult-to-learn domain shows an increase in predictive power, in terms of the accuracy and level of competency of both the ensemble and the component classifiers.
The predictive power of game-related statistics for the final result under the rule changes introduced in the men’s world water polo championship: a …
2019
The objectives of this study were (i) to compare water polo game-related statistics by match outcome (winning and losing teams) after the application of the new rules, and (ii) to develop a classif...
Deterministic Linkage as a Preceding Filter for Other Record Linkage Methods
2015
Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies like probabilistic RL. We investigate the effect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as ‘match’ if all values of attributes considered agree exactly, otherwise as ‘nonmatch’. This strategy is separately combined with two probabilistic RL methods based on the Fellegi–Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training data a…
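The exact-agreement rule described in this abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the record layout (plain dictionaries), and the way the filter hands undecided pairs to a second method are all assumptions, since the visible part of the abstract does not say how the two stages are combined.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]


def deterministic_match(rec_a: Record, rec_b: Record,
                        attributes: Sequence[str]) -> bool:
    """Return True only if every considered attribute agrees exactly.

    Assumes each attribute is present in both records.
    """
    return all(rec_a[attr] == rec_b[attr] for attr in attributes)


def link_with_prefilter(pairs: Sequence[Tuple[Record, Record]],
                        attributes: Sequence[str],
                        second_stage: Callable[[Record, Record], str]) -> List[str]:
    """Label each pair, using exact agreement as a preceding filter.

    Assumption (hypothetical combination scheme): pairs accepted by the
    deterministic rule are labelled 'match' directly; all other pairs are
    passed to `second_stage`, a stand-in for a probabilistic or tree-based
    classifier that returns 'match' or 'nonmatch'.
    """
    labels = []
    for rec_a, rec_b in pairs:
        if deterministic_match(rec_a, rec_b, attributes):
            labels.append("match")
        else:
            labels.append(second_stage(rec_a, rec_b))
    return labels
```

A second-stage classifier can be any callable taking two records; for a purely deterministic run, `lambda a, b: "nonmatch"` reproduces the rule quoted in the abstract.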
QSAR models for tyrosinase inhibitory activity description applying modern statistical classification techniques: A comparative study
2010
Cluster analysis (CA), Linear and Quadratic Discriminant Analysis (L(Q)DA), Binary Logistic Regression (BLR) and Classification Tree (CT) are applied to two datasets to describe tyrosinase inhibitory activity from molecular structures. The first set included 701 tyrosinase inhibitors (TI) that are used for performance of inhibitory and non-inhibitory activity and the second one is for potency estimation of active compounds. 2D TOMOCOMD-CARDD atom-based quadratic indices are computed as molecular descriptors. CA is used for the “rational” design of the training (TS) and prediction (PS) sets, but it proves inadequate as a classification technique. On the first data, the overall a…
Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data
2017
The epigenetics landscape of cells plays a key role in the establishment of cell-type specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…
Computational identification of chemical compounds with potential anti-Chagas activity using a classification tree
2021
Chagas disease is endemic to 21 Latin American countries and is a major public health problem in that region. Current chemotherapy remains unsatisfactory; consequently, the need to search for new drugs persists. Here we present a new approach to identify novel compounds with potential anti-chagasic action. A large dataset of 584 compounds, obtained from the Drugs for Neglected Diseases initiative, was selected to develop the computational model. Dragon software was used to calculate the molecular descriptors and WEKA software to obtain the classification tree. The best model shows accuracy greater than 93.4% for the training set; the tree was also validated using a 10-fold cross-validation p…
Land cover classification of VHR airborne images for citrus grove identification
2011
Managing land resources using remote sensing techniques is becoming a common practice. However, data analysis procedures should satisfy the high accuracy levels demanded by users (public or private companies and governments) in order to be extensively used. This paper presents a multi-stage classification scheme to update the citrus Geographical Information System (GIS) of the Comunidad Valenciana region (Spain). Spain is the leading citrus fruit producer in Europe and the fourth largest in the world. In particular, citrus fruits represent 67% of the agricultural production in this region, with a total production of 4.24 million tons (2006–2007 season). The citrus GIS inventory, created in…